ORDER BY and Unicode - Mailing list pgsql-novice
From | M. Bastin |
---|---|
Subject | ORDER BY and Unicode |
Date | |
Msg-id | a06100515bcc7c7cdeb35@[192.168.0.101] Whole thread Raw |
In response to | pg_dump problem (Noel Faux <noel.faux@med.monash.edu.au>) |
Responses |
Re: ORDER BY and Unicode
|
List | pgsql-novice |
There seems to be a big problem with Unicode for which a solution might already exist. Somebody had the following problem on another mailing list. My suggestion is at the bottom of this message but if another solution already exists I'd like to hear about it. The problem is that special characters aren't treated right under Unicode. Here are a few examples: 1. "UPPER('é')" doesn't work. (That's an accented "e" in there if it doesn't come through in your e-mail application) The implication of this is that SELECT ... WHERE UPPER(mycolumn) LIKE UPPER('my search string') doesn't give the functionality you'd want. UPPER and LOWER seem to work on ASCII only. The Greek, French, etc. are out of luck. 2. "ORDER BY mycolumn" gives a wrong sort order. Uppercase ASCII characters come first, then lowercase ASCII, then accented characters... This really isn't what a human would like to see. I think the two examples above illustrate this Unicode problem quite well. Is there an existing solution? If not could we work together on creating one, as suggested at the very bottom of this message? Thanks, Marc ------------------------------------------------------ You can use the translate function to solve your problem. <http://www.postgresql.org/docs/7.4/interactive/functions-string.html> e.g. for the letter "a": SELECT * FROM mytable ORDER BY translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA'); Then you build an index like this to speed things up: CREATE INDEX MyIndex ON MyTable (translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA')); And your select queries will also be case and accent independent from then on e.g: SELECT * FROM mytable WHERE translate(textcolumn, 'àáâäÀÁÂÄa', 'AAAAAAAAA') LIKE translate('my search string', 'àáâäÀÁÂÄa', 'AAAAAAAAA'); Since the alphabet is very long you're better off creating your own function that does this translation for the whole alphabet and then you'd get something like this: SELECT * FROM mytable WHERE MySimpleABC(textcolumn) LIKE MySimpleABC('my search string') ORDER BY MySimpleABC(textcolumn); Your index would be like this: CREATE INDEX MyIndex ON MyTable (MySimpleABC(textcolumn)); Cheers, Marc PS: Maybe we should work together to create the mother of all functions that would do this for all of Unicode? Anybody else up to this?
pgsql-novice by date: